
    Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures

    In this paper, we study the problem of speeding up the Frank-Wolfe algorithm, a conditional gradient method. We develop and employ two novel inner product search data structures, improving on the prior fastest algorithm of [Shrivastava, Song and Xu, NeurIPS 2021].
    * The first data structure uses a low-dimensional random projection to reduce the problem to a lower dimension, then applies an efficient inner product search data structure. It has preprocessing time $\tilde O(nd^{\omega-1} + dn^{1+o(1)})$ and per-iteration cost $\tilde O(d + n^\rho)$ for a small constant $\rho$.
    * The second data structure leverages recent developments in adaptive inner product search data structures that can output estimates of all inner products. It has preprocessing time $\tilde O(nd)$ and per-iteration cost $\tilde O(d + n)$.
    The first algorithm improves on the state of the art (preprocessing time $\tilde O(d^2 n^{1+o(1)})$ and per-iteration cost $\tilde O(dn^\rho)$) in all cases, while the second offers an even faster preprocessing time and is suitable when the number of iterations is small.
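    To make the role of the inner product search concrete, here is a minimal sketch (not the paper's data structures) of a Frank-Wolfe iteration over the convex hull of n points, where the linear minimization oracle is answered from a Johnson-Lindenstrauss projection computed once during preprocessing; all names and dimensions are illustrative.

```python
import numpy as np

def frank_wolfe_sketch(X, grad_f, x0, iters, k, seed=0):
    """Frank-Wolfe over conv(rows of X), with the linear minimization oracle
    argmin_i <grad, X[i]> answered from a k-dimensional random projection.
    This mimics (and greatly simplifies) a low-dimensional inner product
    search structure: preprocessing projects all n points once."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    S = rng.standard_normal((d, k)) / np.sqrt(k)  # JL sketch matrix
    Xs = X @ S                                    # preprocessing: n points in k dims
    x = x0
    for t in range(iters):
        g = grad_f(x)
        i = int(np.argmin(Xs @ (S.T @ g)))        # approximate LMO in k dims
        eta = 2.0 / (t + 2)                       # standard FW step size
        x = (1 - eta) * x + eta * X[i]
    return x

# Example: minimize ||x - y||^2 over the hull (gradient is 2(x - y)).
X = np.random.default_rng(1).standard_normal((1000, 50))
y = X[:10].mean(axis=0)
x_hat = frank_wolfe_sketch(X, lambda x: 2 * (x - y), X[0].copy(), 100, 16)
```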

    Zen: Near-Optimal Sparse Tensor Synchronization for Distributed DNN Training

    Distributed training is the de facto standard for scaling up the training of Deep Neural Networks (DNNs) across multiple GPUs. The performance bottleneck of distributed training lies in the communication for gradient synchronization. Recently, practitioners have observed sparsity in gradient tensors, suggesting the potential to reduce communication traffic volume and improve end-to-end training efficiency. Yet, the optimal communication scheme to fully leverage sparsity is still missing. This paper aims to address this gap. We first analyze the characteristics of sparse tensors in popular DNN models to understand the fundamentals of sparsity. We then systematically explore the design space of communication schemes for sparse tensors and identify the optimal one. We also develop a gradient synchronization system called Zen that approximately realizes this scheme for sparse tensors. We demonstrate that Zen achieves up to 5.09x speedup in communication time and up to 2.48x speedup in training throughput compared to state-of-the-art methods.
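    As a rough illustration of why sparsity cuts traffic (this is a generic index-value encoding, not Zen's optimal scheme), consider the sketch below: each worker ships only its non-zero entries, so message size scales with the number of non-zeros rather than the dense tensor size.

```python
import numpy as np

def encode_sparse(grad):
    """Pack a mostly-zero gradient tensor into (indices, values) pairs."""
    idx = np.flatnonzero(grad)
    return idx.astype(np.int64), grad.ravel()[idx]

def reduce_sparse(shape, encoded_from_all_workers):
    """Sum the sparse contributions gathered from every worker."""
    out = np.zeros(int(np.prod(shape)), dtype=np.float64)
    for idx, vals in encoded_from_all_workers:
        out[idx] += vals          # overlapping indices accumulate
    return out.reshape(shape)

# Two workers with ~99%-sparse gradients: each message is ~1% of dense size.
rng = np.random.default_rng(0)
g1, g2 = rng.standard_normal((2, 10000)) * (rng.random((2, 10000)) < 0.01)
summed = reduce_sparse((10000,), [encode_sparse(g1), encode_sparse(g2)])
```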

    Raspberry Pi Based Intelligent Wireless Sensor Node for Localized Torrential Rain Monitoring

    Wireless sensor networks have proved effective for long-term localized torrential rain monitoring. However, the widely used architecture of wireless sensor networks for rain monitoring relies on network transport and back-end computation, which delays the response to heavy rain in localized areas. Our work improves this architecture by applying logistic regression and support vector machine classification on an intelligent wireless sensor node built with a Raspberry Pi. The front-end sensor nodes not only collect data from sensors but can also independently estimate the probability of upcoming heavy rain and issue timely early warnings to local clients. Because the sensor nodes send only the computed probability to the back-end server, the network transport burden is reduced. Simulation results demonstrate that our sensor system architecture has the potential to improve the local response to heavy rain and to raise monitoring capacity.
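    A minimal sketch of the node-side classification step using scikit-learn; the feature set, training values, and alert threshold below are hypothetical, chosen only to show the shape of the logistic-regression path.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per historical reading of
# (humidity %, pressure hPa, rainfall rate mm/h); label 1 = heavy rain followed.
X_train = np.array([[92.0, 1002.0, 8.0], [55.0, 1019.0, 0.0],
                    [88.0, 1005.0, 6.5], [47.0, 1022.0, 0.2]])
y_train = np.array([1, 0, 1, 0])
model = LogisticRegression().fit(X_train, y_train)

def on_reading(humidity, pressure, rain_rate, alert_threshold=0.8):
    """Runs on the Raspberry Pi node: estimate the heavy-rain probability
    locally, warn nearby clients immediately, and forward only the
    probability (a few bytes) to the back-end server."""
    p = model.predict_proba([[humidity, pressure, rain_rate]])[0, 1]
    if p >= alert_threshold:
        print(f"LOCAL WARNING: heavy-rain probability {p:.2f}")
    return p
```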

    Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt

    Large Language Models (LLMs), armed with billions of parameters, exhibit exceptional performance across a wide range of Natural Language Processing (NLP) tasks. However, they present a significant computational challenge during inference, especially when deployed on common hardware such as single GPUs. As such, minimizing the latency of LLM inference by curtailing computational and memory requirements, typically through compression, becomes critically important. However, this process inevitably instigates a trade-off between efficiency and accuracy, as compressed LLMs typically experience a reduction in predictive precision. In this research, we introduce an innovative perspective: to optimize this trade-off, compressed LLMs require a unique input format that differs from that of the original models. Our findings indicate that the generation quality of a compressed LLM can be markedly improved for specific queries by selecting prompts with precision. Capitalizing on this insight, we introduce a prompt learning paradigm that cultivates an additive prompt over a compressed LLM to bolster its accuracy. Our empirical results show that through strategic prompt utilization, compressed LLMs can match, and occasionally even exceed, the accuracy of the original models. Moreover, we demonstrate that these learned prompts have a certain degree of transferability across various datasets, tasks, and compression levels. These insights shed light on new possibilities for enhancing the balance between accuracy and efficiency in LLM inference. Specifically, they underscore the importance of judicious input editing for a compressed large model, hinting at potential advancements in scaling LLMs on common hardware.
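    A minimal PyTorch sketch of the soft-prompt idea, assuming a generic frozen model that consumes embeddings; the wrapper, names, and dimensions are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PromptedCompressedLM(nn.Module):
    """Learns only a soft prompt prepended to the input embeddings of a
    frozen, already-compressed language model."""

    def __init__(self, lm, embed, n_prompt_tokens=16):
        super().__init__()
        self.lm, self.embed = lm, embed
        for p in list(lm.parameters()) + list(embed.parameters()):
            p.requires_grad_(False)           # compressed weights stay frozen
        d = embed.embedding_dim
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, d) * 0.02)

    def forward(self, input_ids):
        tok = self.embed(input_ids)                               # (B, T, d)
        pre = self.prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return self.lm(torch.cat([pre, tok], dim=1))              # (B, P+T, d) in

# Toy stand-in: the "compressed LM" is a linear head over embeddings.
embed = nn.Embedding(1000, 64)
lm = nn.Linear(64, 1000)
wrapped = PromptedCompressedLM(lm, embed)
opt = torch.optim.Adam([wrapped.prompt], lr=1e-3)  # train the prompt only
```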

    Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model

    With the rapid growth in model size, fine-tuning large pre-trained language models has become increasingly difficult due to extensive memory usage. Previous works usually focus on reducing the number of trainable parameters in the network. While the model parameters do contribute to memory usage, the primary memory bottleneck during training arises from storing feature maps, also known as activations, which are needed for gradient calculation. Notably, neural networks are usually trained with stochastic gradient descent. We argue that in stochastic optimization, models can tolerate noisy gradients as long as the gradient estimator is unbiased with reasonable variance. Following this motivation, we propose a new family of unbiased estimators for matrix multiplication with reduced variance, called WTA-CRS, which only requires storing the sub-sampled activations for calculating the gradient. We provide both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance than existing ones. By replacing the linear operation in transformers with our approximated one, we achieve up to 2.7$\times$ peak memory reduction with almost no accuracy drop, enabling up to 6.4$\times$ larger batch sizes. Under the same hardware, WTA-CRS enables better downstream task performance by applying larger models and/or faster training with larger batch sizes.
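    For intuition, the following is a sketch of classical unbiased column-row sampling (CRS) for approximating a matrix product, the family of estimators that WTA-CRS refines; it is not the paper's winner-take-all variant.

```python
import numpy as np

def crs_matmul(A, B, k, seed=0):
    """Unbiased estimate of A @ B from k sampled column-row pairs.
    E[estimate] = A @ B because each term is rescaled by 1 / (k * p_i)."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()            # variance-minimizing probabilities
    idx = rng.choice(A.shape[1], size=k, p=p)
    return (A[:, idx] / (k * p[idx])) @ B[idx, :]

# Only the k sampled rows of B (the activations) need to be stored.
A = np.random.default_rng(1).standard_normal((128, 512))
B = np.random.default_rng(2).standard_normal((512, 64))
err = np.linalg.norm(crs_matmul(A, B, 256) - A @ B) / np.linalg.norm(A @ B)
```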

    Multi-Domain Fusion Graph Network for Semi-Supervised PolSAR Image Classification

    The expensive acquisition of labeled data limits the practical use of supervised learning for polarimetric synthetic aperture radar (PolSAR) image analysis. Semi-supervised learning has attracted considerable attention because it can exploit few labeled data together with abundant unlabeled data. The scattering response of PolSAR data depends strongly on spatial distribution, which provides rich information about land-cover properties. In this paper, we propose a semi-supervised learning method named multi-domain fusion graph network (MDFGN) to explore multi-domain fused features spanning the spatial and feature domains. Three major components strengthen the proposed method for PolSAR image analysis. First, we propose a novel sample selection criterion to select reliable unlabeled data for training set expansion. A multi-domain fusion graph improves feature diversity by extending sample selection from the feature domain to the spatial-feature fusion domain, which improves selection accuracy: from few labeled data, a large number of accurately labeled samples are obtained. Second, a multi-model triplet encoder is proposed to achieve superior feature extraction. Equipped with a triplet loss, the limited training samples are fully utilized. Because the expanded training samples come with different patch sizes, multiple models are trained and their outputs fused into the final classification result. Third, a multi-level fusion strategy is proposed to apply different image patch sizes to different expanded training data and to obtain the fused classification result. Experiments are conducted on Radarsat-2 and AIRSAR images. With few labeled samples (about 0.003–0.007%), the overall accuracy of the proposed method ranges between 94.78% and 99.24%, demonstrating its robustness and effectiveness.
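    As an illustration of the triplet-loss component only (the encoder below is a stand-in, not the paper's multi-model architecture, and the 9-channel patch shape is an assumed layout for polarimetric coherency-matrix features):

```python
import torch
import torch.nn as nn

# Stand-in encoder for PolSAR patches; 9 input channels is an assumption.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(9 * 15 * 15, 128))
triplet = nn.TripletMarginLoss(margin=1.0)

anchor = torch.randn(8, 9, 15, 15)    # labeled patches
positive = torch.randn(8, 9, 15, 15)  # same land-cover class as anchor
negative = torch.randn(8, 9, 15, 15)  # different class

loss = triplet(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()  # pulls same-class embeddings together, pushes others apart
```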

    Deformable ConvNet with Aspect Ratio Constrained NMS for Object Detection in Remote Sensing Imagery

    Convolutional neural networks (CNNs) have demonstrated their ability in object detection for very high resolution (VHR) remote sensing images. However, CNNs have obvious limitations in modeling the geometric variations of remote sensing targets. In this paper, we introduce a CNN structure, namely deformable ConvNet, to address geometric modeling in object recognition. By adding offsets to the convolution layers, the feature mapping of a CNN can attend to unfixed locations, enhancing the network's understanding of visual appearance. In our work, a deformable region-based fully convolutional network (R-FCN) is constructed by substituting a deformable convolution layer for the regular convolution layer. To use this deformable ConvNet efficiently, we develop a training mechanism: we first initialize the deformable R-FCN with the parameters of an R-FCN pre-trained on natural images, and then fine-tune the deformable ConvNet on VHR remote sensing images. To remedy the increase in line-like false region proposals, we develop an aspect ratio constrained non-maximum suppression (arcNMS), which improves the precision of deformable ConvNet for detecting objects. Combining deformable R-FCN, the smart fine-tuning strategy, and aspect ratio constrained NMS yields an end-to-end approach that outperforms a state-of-the-art benchmark in object detection without data augmentation.
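    A sketch of the aspect-ratio gate idea behind arcNMS (the paper's exact criterion and thresholds may differ; max_ratio is an assumed parameter): elongated, line-like proposals are discarded before standard greedy NMS.

```python
import numpy as np

def arc_nms(boxes, scores, iou_thresh=0.5, max_ratio=5.0):
    """Greedy NMS preceded by an aspect-ratio gate that discards line-like
    proposals (width/height or height/width above max_ratio).
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    ok = np.maximum(w / h, h / w) <= max_ratio     # aspect-ratio constraint
    boxes, scores = boxes[ok], scores[ok]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]            # suppress heavy overlaps
    return boxes[keep], scores[keep]
```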